ABSTRACT We use a free energy functional theory
to elucidate general properties of heterogeneously
ordering, fast folding proteins, and we test our con- 
clusions with lattice simulations. We find that both 
structural and energetic heterogeneity can lower the 
free energy barrier to folding. Correlating stronger 
contact energies with entropically likely contacts of a 
given native structure lowers the barrier, and anticor- 
relating the energies has the reverse effect. Designing 
in relatively mild energetic heterogeneity can elimi- 
nate the barrier completely at the transition temper- 
ature. Sequences with native energies tuned to fold 
uniformly, as well as sequences tuned to fold by a sin- 
gle or a few routes, are rare. Sequences with weak na- 
tive energetic heterogeneity are more common; their 
folding kinetics is more strongly determined by prop- 
erties of the native structure. Sequences with dif- 
ferent distributions of stability throughout the pro- 
tein may still be good folders to the same structure. 
A measure of folding route narrowness is introduced 
which correlates with rate, and which can give infor- 
mation about the intrinsic biases in ordering due to 
native topology. This theoretical framework allows us 
to systematically investigate the coupled effects of en- 
ergy and topology in protein folding, and to interpret 
recent experiments which investigate these effects. 

The energy landscape has been a central paradigm
in understanding the physical principles behind the
self-organization of biological molecules [1]-[?]. A central
feature of the landscapes of biomolecules which has
emerged is that the process of evolution, in selecting
for sequences that fold reliably to a stable conformation
within a biologically relevant time, induces a
new energy scale into the landscape [5]-[?]. In addition
to the ruggedness energy scale already present
in heteropolymers, the landscape now has the overall
topography of a funnel [?]-[10]. A sequence with a funneled
landscape has a low energy native state occupied with
large Boltzmann weight at temperatures high enough
that folding kinetics is not dominated by slow escape
from individual traps.



As an undesigned heteropolymer with a random,
un-evolved sequence is cooled, it becomes trapped
in one of many structurally different low energy
states, similar to the phase transitions seen in spin
glasses, glasses, and rubber. The low temperature
states typically look like a snapshot of the high
temperature collapsed states, but have dramatically
slower dynamics. On the other hand, when a designed
heteropolymer or protein is cooled, it reliably and
quickly finds the dominant low energy structure(s)
corresponding to the native state, in a manner similar
to the phase transition from the gas or liquid to
the crystal state. As in crystals, the low temperature
states typically have a lower symmetry group than
the many high temperature states [11]. Connections
have been made between native structural symmetry
and robustness to mutations of proteins [11]-[13].
Funnel topographies are maximized in atomic clusters
when highly symmetric arrangements of the atoms
are possible, as in van der Waals clusters with "magic
numbers" [14], [15], and similar arguments have been
applied to proteins [11], where funneled landscapes
are directly connected to mutational robustness [16].

It is appealing to connect the symmetry and
designability of native structures to
the actual kinetics of the folding process, arguing that
symmetry or uniformity in ordering the protein maximizes
the number of folding routes and thus the ease
of finding a candidate folding nucleus, thus maximizing
the folding rate. Explicit signatures of multiple
folding routes as predicted by the funnel theory [17],
[18] have been seen in simulations of well-designed
proteins [19]-[23] as well as experiments on several
small globular proteins [24]-[26]. However, these folding
routes are not necessarily equivalent. There is an
accumulating body of experimental [27]-[31] and simulation
[?], [32]-[?] evidence which shows varying degrees
of heterogeneity in the ordering process. These
data refine the funnel picture by focusing on which
parts of the protein most effectively contribute to ordering,
and on the effects of native topology and native
energy distribution on rates and stability. The
ensemble of foldable sequences with a given ratio of
$T_F/T_G > 1$ has a wide distribution of mean first passage
times [?], [43], indicating that several other
properties of the sequence and structure contribute to
folding thermodynamics and kinetics. These include
topological properties of the native structure [11], [44]-[50]
(e.g. mean loop length $\bar{\ell}$, dispersion in loop length
$\delta\ell$, and kinetic accessibility of the native structure),
the distribution over contacts of total native energy
in the protein, and the coupling of contact energetics
with native topology.

In this paper we integrate the above sundry observations
into a theory which explicitly accounts
for native heterogeneity, structural and energetic, in
the funnel picture. We introduce a simple field theory
with a non-uniform order parameter to study
fluctuations away from uniform ordering, through
free energy functional methods introduced earlier by
Wolynes and collaborators [?]. The theory is
in agreement with simulations also performed in this
paper. We organize the paper as follows. First we
outline the calculation and results. Next we derive
and use an approximate free energy functional which
captures the essence of the problem. Then we conclude
and suggest future research, leaving technical
aspects of the derivation for the methods section.

OUTLINE. The free energy functional description 
in principle allows for a fairly complete understanding 
of the folding process for a particular sequence; this 
includes effects due to the three dimensional topolog- 
ical native structure, possible misfolded traps, and 
heterogeneity among the energies of native contacts. 
We model a well-designed, minimally frustrated pro- 
tein with an approximate functional, but many of the 
results we obtain are quite general. We find that for 
a well-designed protein, gains in loop entropy and/or 
core energy always dominate over losses in route en- 
tropy, so the thermodynamic folding barrier is always 
reduced by any preferential ordering in the protein. 

1 We treat only native couplings in detail, accounting for
non-native interactions as a uniform background field. Additionally,
the correlation between contacts is a function
only of the overall order Q in our theory. This is analogous
to the Hartree approximation in the one-electron theory
of solids, where electrons mutually interact only through
an averaged field; extensions of our theory to include correlation
mediated by native structure may be examined within
the density-functional framework, and are a topic of future research.
On the other hand, tests of the theory by simulation
(fig. 1) produce qualitatively the same results, so the conclusions
are not affected by including correlations to any order.

2 Folding heterogeneity affects the free energy in three ways:
1.) The number of folding routes to the native state decreases;
this effect increases the folding barrier. 2.) The conformational
entropy of polymer loops increases, since native cores



However as long as ordering heterogeneity is not too 
large, there are still many folding routes to the na- 
tive structure, and the funnel picture is valid. When 
there are very few routes to the native state due to 
large preferential ordering, folding is slow and multi- 
exponential at temperatures where the native struc- 
ture is stable. In this scenario the rate is governed 
by the kinetic traps along the path induced, rather 
than the putative thermodynamic barrier which is ab- 
sent. Several physically motivated arguments giving 
the above results are described in the supplementary 
material. 

To analyze the effects of native energetic as well as
structural heterogeneity on folding, we coarsely describe
the native structure through its distributions
of native contact energies $\{\epsilon_i\}$ and native loop lengths
$\{\ell_i\}$. Here $\epsilon_i$ is the solvent averaged effective energy
of contact $i$, and $\ell_i$ is the sequence length pinched
off by contact $i$. The labeling index $i$ runs from 1
to $M$, where $M = zN$ is the total number of contacts,
$N$ is the length of the polymer, and $z$ is the number
of contacts per residue. In the spirit of the density
functional theory of fluids [?] we introduce a coarse-grained
free energy functional $F(\{Q_i(Q)\}|\{\epsilon_i\},\{\ell_i\})$
approximating the physics of secondary (as e.g. along
a helix) and tertiary (non-local) contacts in ordering.
$Q$ is defined as the overall fraction of native contacts
made, used here to stratify the configurations with
given similarity to the native state, since this partitioning
results in a funnel topography of the energy
landscape for designed sequences [?]. The fraction
of time contact $i$ is made in the sub-ensemble of states
at $Q$ is $Q_i(Q)$. From a knowledge of this functional
all relevant thermodynamic functions can in general
be calculated, such as transition state entropies and
energies, barrier heights, and surface tensions. Moreover,
derivatives of the functional give the equilibrium
distribution and correlation functions describing the
microscopic structure of the inhomogeneous system,
as we see below.
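
As a concrete illustration of this coarse description, the following sketch extracts the contact count M, the contacts per residue z, and the mean and dispersion of the loop lengths from a list of native contacts. It is an illustrative aid rather than part of the analysis above. It assumes that the loop length pinched off by a contact between residues i and j is the sequence separation |i - j|; the function name `loop_statistics` and the toy contact list are hypothetical.

```python
from statistics import mean, pstdev

def loop_statistics(contacts, n_residues):
    """Summarize a native contact map given as (i, j) residue index pairs.

    Assumption: the loop length of a contact is the sequence separation |i - j|.
    """
    loops = [abs(j - i) for i, j in contacts]
    return {
        "M": len(loops),                    # total number of native contacts
        "z": len(loops) / n_residues,       # contacts per residue, so M = z N
        "mean_loop": mean(loops),           # mean loop length
        "loop_dispersion": pstdev(loops),   # standard deviation of loop lengths
    }

# Toy contact list for a hypothetical 10-residue chain (not the 27-mer of the text).
stats = loop_statistics([(0, 3), (1, 6), (2, 9)], n_residues=10)
```

The population standard deviation is used because the contact list is treated as the complete set of native contacts, not a sample.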

Given all the contact energies $\{\epsilon_i\}$ and loop lengths
$\{\ell_i\}$ for a protein, the thermal distribution of contact
probabilities $\{Q_i(Q)\}$ is found by minimizing the free
energy functional $F(\{Q_i(Q)\}|\{\epsilon_i\},\{\ell_i\})$ subject to
the constraint that the average probability is $Q$, i.e.
$\sum_i Q_i = MQ$ ($Q$ parameterizes the values of the
$Q_i$'s). Since in the model the probability of a contact
to be formed is a function of its energy and loop
length, we can next consider the minimized free energy
as a function of the contact energies for a given
native topology: $F(\{\epsilon_i\}|\{\ell_i\})$. Then we can seek the
special distribution of contact energies $\{\epsilon_i^*(\{\ell_i\})\}$ that
minimizes or maximizes the thermodynamic folding
barrier to a particular structure by finding the extremum
of $F^\ddagger(\{\epsilon_i\}|\{\ell_i\})$ with respect to the contact
energies $\epsilon_i$, subject to the constraint of fixed native
energy, $\sum_i \epsilon_i = M\bar{\epsilon} = E_N$. This distribution when
substituted into the free energy gives in principle the
extremum free energy barrier as a function of native
structure $F^\ddagger(\{\ell_i\})$, which might then be optimized
for the fastest/slowest folding structure and its corresponding
barrier. We found that the only
distribution of energies for which the free energy was
an extremum is in fact the distribution which maximizes
the barrier by tuning all the contact probabilities
to the same value.

with larger halo entropies are more strongly weighted. This decreases
the folding barrier. 3.) Making likely contacts stronger
in energy lowers the thermal energy of partially native structures;
this decreases the folding barrier.

3 This procedure is analogous to finding the most probable
distribution of occupation numbers, and thus the thermodynamics,
by maximizing the microcanonical entropy for a system
of particles obeying a given occupation statistics - here
the effective particles (the contacts) obey Fermi-Dirac statistics,
c.f. eq. (7).

METHODS. We derive an approximate free energy
functional, which takes account of ordering
heterogeneity, starting from a contact Hamiltonian
$\mathcal{H}(\{\Delta_{\alpha\beta}\}|\{\Delta^N_{\alpha\beta}\})$ of the form

$$\mathcal{H} = \sum_{\alpha<\beta} \left[ \epsilon^N_{\alpha\beta}\,\Delta_{\alpha\beta}\Delta^N_{\alpha\beta} + \epsilon_{\alpha\beta}\,\Delta_{\alpha\beta}\left(1 - \Delta^N_{\alpha\beta}\right) \right] \qquad (1)$$

Here the double sum is over residue indices, $\Delta_{\alpha\beta} = 1$ (0)
if residues $\alpha$ and $\beta$ (do not) contact each
other, $\Delta^N_{\alpha\beta} = 1$ (0) if these residues (do not) contact
each other in the native configuration. The
sum over native energies $\epsilon^N_{\alpha\beta}$ and non-native energies
$\epsilon_{\alpha\beta}$ gives the energy for a particular configuration.
To obtain the thermodynamics we proceed
by obtaining the distribution of state energies in the
microcanonical ensemble by averaging non-native interactions
over a Gaussian distribution of variance $b^2$:
$P(E|E_N,\{\Delta_{\alpha\beta}\Delta^N_{\alpha\beta}\}) = \langle \delta[E - \mathcal{H}\{\Delta_{\alpha\beta}\}]\,\delta[E_N - \mathcal{H}\{\Delta^N_{\alpha\beta}\}] \rangle_{\rm n-nat}$.
The averaging results in a Gaussian
distribution having mean $\sum_i \epsilon_i \mathcal{Q}_i$ and variance
$Mb^2(1 - Q)$, where $\mathcal{Q}_i = \Delta_{\alpha\beta}\Delta^N_{\alpha\beta}$ counts native contacts
present in the configuration state inside the
stratum $Q$. From this distribution the log density
of states is obtained in terms of the configurational
entropy of stratum $Q$, $S(\{\mathcal{Q}_i\}|Q)$, and the free energy
functional $F(\{Q_i\}|Q)$ obtained by performing
the usual Legendre transform to the canonical ensemble
(c.f. eq. (4)).

4 A similar derivation of the free energy for a uniform order
parameter Q was calculated in ref. [10].

5 This approach assumes minimal frustration, in that native
heterogeneity is explicitly retained and non-native heterogeneity
is averaged over; phenomena specific to a particular set
of non-native energies, e.g. "off-pathway" intermediates, are
smoothed over in this procedure.

We express the free energy in terms of an arbitrary
distribution of contact probabilities - the distribution
of $\{Q_i\}$ that minimizes $F(\{Q_i\}|Q)$ is the (most
probable) thermal distribution. For the ensemble
of configurations at $Q$, we define the entropy that
corresponds to the multiplicity of contact patterns as
$S_{\rm ROUTE}(\{Q_i\}|Q)$ ($> 0$), and the configurational entropy
lost from the coil state to induce a contact pattern
$\{Q_i\}$ as $S_{\rm BOND}(\{Q_i\}|\{\ell_i\},Q)$ ($< 0$). We make
no capillarity or spinodal assumption, and treat the
route entropy as the entropy of a binary fluid mixture
[10], [53], modified by a prefactor $\lambda(Q) = 1 - Q^\alpha$,
which measures the number of combinatoric states
reduced by chain topology: residues connected by
a chain have less mixing entropy than if they were
free.

$$S_{\rm ROUTE} = \lambda(Q) \sum_{i=1}^{M} \left[ -Q_i \ln Q_i - (1 - Q_i)\ln(1 - Q_i) \right] \qquad (2)$$

We introduce a measure of "routing" $\mathcal{R}(Q)$ by expanding
the entropy to lowest order,
$S_{\rm ROUTE}(\{Q + \delta Q_i\}) = S^{\circ}_{\rm ROUTE} - \lambda M \mathcal{R}(Q)/2$, where we have defined
$\mathcal{R}(Q)$ by $\mathcal{R}(Q) = \langle \delta Q^2 \rangle / \langle \delta Q^2 \rangle_{\rm MAX}$, which is the
variance of contact probabilities normalized by the
maximal variance. In the limit $\mathcal{R}(Q) = 0$ the
uniformly ordering system has the maximal route
entropy. When $Q_i = 0$ or 1 only, $\mathcal{R}(Q) = 1$,
$S_{\rm ROUTE} = 0$, and only one route to the native state is
allowed.
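
The route entropy of eq. (2) and the routing measure can be evaluated directly for a given set of contact probabilities. The sketch below is a minimal illustration, with hypothetical function names and toy probability lists; it also compares the full route entropy with the second-order estimate quoted above for a weakly heterogeneous case.

```python
import math

def route_entropy(q, alpha=1.0):
    """Eq. (2): lambda(Q) * sum_i [-Q_i ln Q_i - (1 - Q_i) ln(1 - Q_i)]."""
    Q = sum(q) / len(q)
    lam = 1.0 - Q ** alpha          # prefactor lambda(Q) = 1 - Q^alpha

    def mixing(x):
        # The x ln x terms vanish in the limits x -> 0 and x -> 1.
        if x <= 0.0 or x >= 1.0:
            return 0.0
        return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)

    return lam * sum(mixing(x) for x in q)

def routing_measure(q):
    """R(Q) = <dQ^2> / <dQ^2>_MAX with <dQ^2>_MAX = Q (1 - Q); needs 0 < Q < 1."""
    Q = sum(q) / len(q)
    variance = sum((x - Q) ** 2 for x in q) / len(q)
    return variance / (Q * (1.0 - Q))

uniform = [0.5, 0.5, 0.5, 0.5]        # uniform ordering
mild = [0.55, 0.45, 0.55, 0.45]       # weak heterogeneity
single_route = [1.0, 1.0, 0.0, 0.0]   # contacts either always or never made

# Second-order estimate S_route ~ S0_route - lambda M R / 2 for the mild case.
lam, M = 0.5, len(mild)
estimate = route_entropy(uniform) - lam * M * routing_measure(mild) / 2.0
```

For the uniform list the routing measure is zero and the route entropy is maximal; for the single-route list the measure is one and the route entropy vanishes, matching the two limits described in the text.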

In the supplementary material we derive a form 

6 Note that in eq. (4) we explicitly include the thermal trace
over configurations at overall order Q.

7 In the contact representation, the averaged bond occupation
probabilities $Q_i = \langle \mathcal{Q}_i \rangle_{\rm TH}$ are analogous to the averaged
number density operator in an inhomogeneous fluid:
$\langle n(\mathbf{x}) \rangle_{\rm TH} = \langle \sum_i \delta(\mathbf{x}_i - \mathbf{x}) \rangle_{\rm TH}$.

8 The value $\alpha = 1.37$ gives the best fit to the lattice 27-mer
data for the route entropy, while $\alpha = 1.0$ best fits the 27-mer
free energy function. We generally use $\alpha = 1.0$ since the 27-mer
is small - for larger systems $\alpha$ is smaller: more polymer
is buried and thus more strongly constrained by surrounding
contacts.

9 We avoid the word "pathway" since several definitions
exist in the literature; here a single route is unambiguously
defined through the limit $S_{\rm ROUTE} \to 0$.

10 That is, if $MQ$ contacts were made with probability 1 and
$M - MQ$ contacts were made with probability 0, then
$\langle (Q_i - Q)^2 \rangle_{\rm MAX} = (1/M)\left(MQ(1-Q)^2 + (M - MQ)Q^2\right) = Q(1-Q)$.
Thus $\mathcal{R}(Q)$ is between 0 and 1.

11 That is, since all $Q_i$ are only zero or one at any degree
of nativeness, each successive bond added must always be the
same one, so folding is then a random walk on the potential defined
by that single route (there is still chain entropy present).
$\mathcal{R}(Q)$ is in the spirit of a Debye-Waller factor applied to folding
routes.
for the configurational entropy loss to fold to a given
topological structure by accounting for the distribution
of entropy losses to form bonds or contacts due
to the distribution of sequence lengths in that structure.
We let the effective sequence (loop) length between
residues $i$ and $j$, $\ell_{\rm EFF}(|i-j|,Q)$, be a function
of $Q$ (this is a mean field approximation), and we
take the entropy loss to close this loop to be of the
Flory form $\sim (3/2)\ln(a/\ell_{\rm EFF})$. The requirement that
the entropy be a state function restricts the possible
functional form of the effective loop length. The result
of the derivation for the contact entropy loss to
form state $\{Q_i\}$ is

$$S_{\rm BOND} = -\frac{3}{2} M \left( \langle \delta Q\,\delta\ln\ell \rangle + S_{\rm MF}(Q,\bar{\ell}) \right) \qquad (3)$$

where $\langle \delta Q\,\delta\ln\ell \rangle = (1/M)\sum_i (Q_i - Q)(\ln\ell_i - \overline{\ln\ell})$
is the correlation between the fluctuations in contact
probability and log loop length, and $S_{\rm MF}(Q,\bar{\ell})$
is the mean-field bond entropy loss (described in the
supplement), which is a function only of $Q$ and the
mean loop length $\bar{\ell}$. By eq. (3) the entropy is raised
above that of a symmetrically ordering system when
shorter ranged contacts have higher probability to be
formed; this effect lowers the barrier. Eqs. (2), (3),
and (4) together give expression (6) for the free energy
$F(\{Q_i(Q)\}|\{\epsilon_i\},\{\ell_i\})$ of a well-designed protein
that orders heterogeneously.
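
The heterogeneous part of eq. (3) depends only on the correlation between contact probability and log loop length, so it is easy to evaluate numerically. The following sketch uses hypothetical function names and a two-contact toy example; the mean-field term $S_{\rm MF}$ is described in the supplement and is deliberately left out.

```python
import math

def loop_correlation(q, loops):
    """<dQ d ln l> = (1/M) sum_i (Q_i - Q)(ln l_i - mean(ln l))."""
    M = len(q)
    Q = sum(q) / M
    mean_log = sum(math.log(l) for l in loops) / M
    return sum((qi - Q) * (math.log(l) - mean_log) for qi, l in zip(q, loops)) / M

def bond_entropy_shift(q, loops):
    """Heterogeneous part of eq. (3): -(3/2) M <dQ d ln l> (S_MF is omitted)."""
    return -1.5 * len(q) * loop_correlation(q, loops)

# Two hypothetical contacts: a short loop (3) formed often, a long loop (9) formed rarely.
shift = bond_entropy_shift([0.8, 0.2], [3, 9])
```

Here the shorter loop is the more probable contact, so the correlation is negative and the entropy shift is positive, which is the barrier-lowering case described above. With uniform probabilities the shift is zero.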

The lattice protein used in fig. 1 to check the theory
is a chain of 27 monomers constrained to the vertices
of a 3-D cubic lattice. Details of the model
and its behavior can be found in refs. [?].
Monomers have non-bonded contact interactions
with a Go potential (native interactions only).
Coupling energies were chosen for row 1 of fig. 1 by
first running a simulated annealing algorithm to find
the set $\{\epsilon_i^*\}$ that makes all the $Q_i(\{\epsilon^*\}) = Q$ at the
barrier peak. Energies are always constrained to sum
to a fixed total native energy: $\sum_i \epsilon_i = M\bar{\epsilon}$. Then
energies were relaxed by letting $\epsilon_i = \epsilon_i^* + \alpha(\bar{\epsilon} - \epsilon_i^*)$.
The values $\alpha = 1, 1.35, 2.05$ were used in rows 2, 3,
and 4 respectively.
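
The relaxation rule is a linear interpolation (and, for $\alpha > 1$, extrapolation) between the symmetrizing energies and their mean, so it preserves the total native energy for any $\alpha$. A minimal sketch, with a hypothetical function name and made-up starting energies standing in for the annealed set:

```python
def relax_energies(eps_star, alpha):
    """Return eps_i = eps*_i + alpha * (mean - eps*_i); the sum is unchanged."""
    mean = sum(eps_star) / len(eps_star)
    return [e + alpha * (mean - e) for e in eps_star]

# Hypothetical symmetrizing energies eps*_i (units of the mean strength, negative = attractive).
eps_star = [-1.5, -1.0, -0.5]
rows = {a: relax_energies(eps_star, a) for a in (0.0, 1.0, 1.35, 2.05)}
```

With $\alpha = 0$ the energies are unchanged, with $\alpha = 1$ they are all equal to the mean, and with $\alpha > 1$ the ordering of strong and weak contacts is reversed relative to the symmetrizing set.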

FREE ENERGY FUNCTIONAL. By averaging 
a contact Hamiltonian over non-native interactions, 
we can derive an approximate free energy functional 
for a well-designed protein (See the methods section). 
We analyze here heterogeneity in minimally frus- 
trated sequences, where the roughness energy scale 
b is smaller than the stability gap e. The general 



form of the free energy functional is 



12 Corner, crankshaft, and end moves are allowed. Free energies
and contact probabilities are obtained by equilibrium
Monte Carlo sampling using the histogram method [?]. Sampling
error is < 5%.
$$F = \left\langle \sum_i \epsilon_i \mathcal{Q}_i - T S(\{\mathcal{Q}_i\}|Q) \right\rangle_{\rm TH} - \frac{M b^2}{2T}(1 - Q) \qquad (4)$$

where $\mathcal{Q}_i = (0, 1)$ counts native contacts in a configurational
state (so the sum on $\epsilon_i \mathcal{Q}_i$ gives the state's
energy), summing $S(\{\mathcal{Q}_i\}|Q)$ gives the state's configurational
entropy, and then this is thermally averaged
over all states restricted to have $MQ$ contacts. The
second term accounts for low energy non-native traps.

The study of the configurational entropy is a fascinating
but complicated problem detailed in the
methods section. In summary this entropy functional
generalizes the Flory mean-field result [53], [54]
to account for the topological heterogeneity inherent
in the native structure and a finite average return
length for that structure (contact order [?]), as
well as to account for the number of folding routes
to the native structure. The amount of route diversity
or narrowness in folding can be quantified in
terms of the relative fluctuations of contact formation
$\delta Q_i = Q_i(Q) - Q$:

$$\mathcal{R}(Q) = \langle \delta Q^2 \rangle / \langle \delta Q^2 \rangle_{\rm MAX} \qquad (5)$$

which is useful for our analysis below. Our resulting
analytic expression for the free energy of a protein
that folds heterogeneously is



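$$\frac{F}{M} = \frac{F^{\circ}_{\rm MF}(Q)}{M} + \langle \delta Q\,\delta\epsilon \rangle + \frac{\lambda T}{2}\,\mathcal{R}(Q) + \frac{3}{2}\,T\,\langle \delta Q\,\delta\ln\ell \rangle \qquad (6)$$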



Here $F^{\circ}_{\rm MF}(Q)$ is the uniform-field free energy function
(similar to that obtained previously in [10]). The
free energy functional is approximate in that it results
from an integration over a local free energy
density whose only information about the surrounding
medium is through the average field present ($Q$),
$F = \sum_i f_i(Q_i, Q)$. Cooperative entropic effects due
to local correlations [18], [?] between contacts would
be an important extension of the model, and have
been treated elsewhere in similar models [?]. Inspection
of eq. (6) shows that as heterogeneity increases,
the effect on the barrier is a competition between
energetic and polymer entropy gains (2nd and
4th terms) and route entropy losses (3rd term) as described
above.

13 We have expanded the route entropy eq. (2) to second
order in this expression for clarity; in deriving the results of
the theory the full expression is used.

Minimizing the free energy (6) at fixed $Q$,
$\delta(F + \mu \sum_j Q_j) = 0$, gives a Fermi distribution for the most
probable bond occupation probabilities $\{Q_i^*\}$ for a
given $\{\epsilon_i\}$ and $\{\ell_i\}$:

$$Q_i^*(Q) = \frac{1}{1 + \exp\left[ (\mu' + \epsilon_i - T s_i)/\lambda T \right]} \qquad (7)$$

where the Lagrange multiplier $\mu' \sim -(1/M)\,\partial F/\partial Q$
is related to the effective force on the potential $F(Q)$.
Positive second variation of $F$ indicates the extremum
is in fact a minimum.
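
To see how the Fermi form distributes a fixed overall order $Q$ among contacts, one can fix the Lagrange multiplier numerically. The sketch below does this by bisection. It is only an illustration under stated assumptions: the per-contact entropy is taken to be the bare Flory form $s_i = -(3/2)\ln\ell_i$ (the theory itself uses a $Q$-dependent effective loop length), $k_B = 1$, and the function names and toy inputs are hypothetical.

```python
import math

def fermi(x):
    """1 / (1 + exp(x)), guarded against overflow for large positive x."""
    return 0.0 if x > 700.0 else 1.0 / (1.0 + math.exp(x))

def contact_probabilities(eps, loops, Q, T=1.0, alpha=1.0):
    """Solve eq. (7) for {Q_i}, fixing mu' by bisection so that mean(Q_i) = Q.

    Assumptions: 0 < Q < 1 and s_i = -(3/2) ln l_i (bare Flory loop entropy).
    """
    lam = 1.0 - Q ** alpha
    field = [e + 1.5 * T * math.log(l) for e, l in zip(eps, loops)]  # eps_i - T s_i

    def probs(mu):
        return [fermi((mu + f) / (lam * T)) for f in field]

    lo = -max(field) - 50.0 * lam * T   # here every Q_i is close to 1
    hi = -min(field) + 50.0 * lam * T   # here every Q_i is close to 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sum(probs(mid)) / len(field) > Q:
            lo = mid                    # contacts too probable: raise mu'
        else:
            hi = mid
    return probs(0.5 * (lo + hi))

loops = [3, 5, 9, 15]
equal_energies = contact_probabilities([-1.0] * 4, loops, Q=0.5)
tuned = contact_probabilities([-1.5 * math.log(l) for l in loops], loops, Q=0.5)
```

With equal energies the shorter loops come out more probable, which is the purely topological bias. When the energies are tuned to cancel the loop entropy, all contacts have the same probability, the uniformly ordering case.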

OPTIMIZING RATES, STABILITY, AND
ENTROPY. We now consider the effects on the free
energy when the native interactions between residues
are changed in a controlled manner. The theory predicts
a barrier at the transition temperature of a few
$k_B T$, in general agreement with experiments on small,
single-domain proteins. The barrier height is fairly 
small compared to the total thermal energy of the 
system, reflecting the exchange of entropy for energy 
as the protein folds. However the barrier height can 
vary significantly depending on which parts of the 
protein are more stable in their local native struc- 
ture. At uniform stability we find the largest bar- 
rier (for a given total native energy): about twice as 
large as the barrier when stability is governed purely 
by the three-dimensional native structure, i.e. when 
all interaction energies are equal. Increasing hetero- 
geneity, by energetically favoring regions of the pro- 
tein which are already entropically likely to order, 
systematically decreases the barrier, and in fact can 
eliminate the barrier entirely if the heterogeneity is 
large enough. See figure 1. 

We seek to relax the values of $\{\epsilon_i\}$ at fixed native
energy $E_N = \sum_j \epsilon_j$ to the distribution $\{\epsilon_i^*(\{\ell_i\})\}$
that extremizes the free energy barrier, by finding
the solution of $\sum_i [\delta F/\delta\epsilon_i - p]\,\delta\epsilon_i = 0$ for arbitrary
and independent variations $\delta\epsilon_i$ in the energies. It can
be shown that $\delta F/\delta\epsilon_i = \partial F/\partial\epsilon_i + \mu\,(\delta/\delta\epsilon_i)\sum_j Q_j$;
however the second term is zero since $\delta Q/\delta\epsilon_i = 0$,
so by eq. (6) $\delta F/\delta\epsilon_i = Q_i$. The contact probability
plays the role of the local density, and the perturbation
$\delta\epsilon_i$ the role of an external field, as in liquid
state theory. At the extremum all contact probabilities
are equal: $Q_i = p = Q^\ddagger$, which in our model
means that longer loops have lower (stronger) energies:
$\delta\epsilon_i = T\delta s_i = -(3/2)\,T\,\delta\ln\ell_i$; there is full symmetry
in the ordering of the protein at the extremum.
Evaluating the second derivative mechanical-stability
matrix shows $Q_i = Q^\ddagger$ to be an unstable maximum:

$$\left( \frac{\partial^2 F}{\partial\epsilon_i\,\partial\epsilon_j} \right)_{Q^\ddagger} = -\delta_{ij}\,\frac{Q^\ddagger (1 - Q^\ddagger)}{\lambda^\ddagger T} \qquad (8)$$

This is clearly negative, meaning that tuning the energies
so that $Q_i = Q^\ddagger$ maximizes the free energy
at the barrier peak. Since the change in the unfolded
state (at $Q \approx 0$) is much weaker than at the
transition state, the barrier height itself is essentially
maximized. Substituting eq. (8) into a Taylor expansion
of the free energy at the extremum (and using
$\lambda^\ddagger = \lambda(Q^\ddagger) \approx 1 - Q^\ddagger$) gives for the rate

$$k \propto \exp\left( Q^\ddagger M\,\overline{\delta\epsilon^2} / 2T^2 \right) \qquad (9)$$



which is to be compared with eq. (1) in the supplementary
material (obtained by an argument using the
random energy model). In terms of the route narrowness
measure $\mathcal{R}(Q)$ the change in free energy barrier
on perturbation is

$$\delta\Delta F^\ddagger = -\frac{1}{2}\,M \lambda^\ddagger T\,\mathcal{R}(Q^\ddagger) \qquad (10)$$



A variance in contact participations $\overline{\delta Q^2} = 0.05$,
which is about 20% of the maximal dispersion ($\approx 1/4$,
taking $Q^\ddagger \approx 1/2$), lowers the barrier by about
$0.1 N k_B T$, or about $5 k_B T$ for a chain of length $N \approx 50$
(believed to model a protein with ~100 aa [?]).
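
The arithmetic behind this estimate can be checked with a few lines. The sketch below evaluates eq. (10) with $\lambda^\ddagger = 1 - Q^\ddagger$ (that is, $\alpha = 1$) and $k_B = 1$. The value $z = 2$ contacts per residue is an assumption made here because it reproduces the quoted $0.1 N k_B T$; the function name is hypothetical.

```python
import math

def barrier_change(M, Q_ts, var_Q, T=1.0, alpha=1.0):
    """Eq. (10): change in barrier, -(1/2) M lambda T R, in units where k_B = 1."""
    lam = 1.0 - Q_ts ** alpha              # lambda at the transition state
    R = var_Q / (Q_ts * (1.0 - Q_ts))      # routing measure, eq. (5)
    return -0.5 * M * lam * T * R

N = 50
z = 2                    # assumed contacts per residue (chosen to match the quoted estimate)
dF = barrier_change(M=z * N, Q_ts=0.5, var_Q=0.05)   # about -5, in units of k_B T
speedup = math.exp(-dF)  # corresponding rate factor if the prefactor is unchanged
```

Under these assumptions the routing measure is 0.2 and the barrier drops by about 5 $k_B T$, in line with the figure quoted in the text.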

We can extend the analysis by perturbing about 
a structure with mean loop length $\bar{\ell}$, and including
effects on the barrier due to dispersion in loop length 
and correlations between energies and loop lengths. 
A perturbation expansion of the free energy gives to 
lowest order: 



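$$\frac{\delta\Delta F^\ddagger}{M} = -\frac{Q^\ddagger}{2T} \left[ \left(\frac{3T}{2}\right)^2 \frac{\overline{\delta\ell^2}}{\bar{\ell}^2} + 3T\,\frac{\overline{\delta\ell\,\delta\epsilon}}{\bar{\ell}} + \overline{\delta\epsilon^2} \right] \qquad (11)$$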



indicating that the free energy barrier is additionally
lowered by structural variance in loop lengths,
and also when shorter range contacts become stronger
energetically ($\delta\ell_i < 0$ and $\delta\epsilon_i < 0$) or longer range
contacts become weaker energetically ($\delta\ell_i > 0$ and
$\delta\epsilon_i > 0$), i.e. in the model the free energy is additionally
lowered when fluctuations are correlated so
as to further increase the variance in contact participations.
This effect has been seen in experiments by
the Serrano group [46], [50].

To test the validity of the theory, we compare the 
analytical results obtained from our theory with the 
results from simulation of a 27-mer lattice protein 
model. The comparison is shown in figure 1, where a
full analysis is performed. All energies are in units of
the mean native interaction strength $\bar{\epsilon}$.

The rate dependence on heterogeneity should be
experimentally testable by measuring the dependence
of the folding rate at the transition temperature of
a well-designed protein on the dispersion of $\Phi$-values.
It is important that before and after the mutation(s)
the protein remains fast-folding to the native structure
without "off-pathway" intermediates, and that
its native state enthalpy remains approximately the
same, perhaps by tuning environmental variables.
CONCLUSIONS AND FUTURE WORK. In

this paper we have introduced refinement and insight
into the funnel picture by considering heterogeneity
in the folding of well-designed proteins. We have explored
in minimally frustrated sequences how folding
is affected by heterogeneity in native contact energies,
as well as the entropic heterogeneity inherent in folding
to a specific three-dimensional native structure.
Specifically we examined the effects on the folding
free energy barrier, the distribution of participations in
the transition state ensemble (TSE'), as well as the
diversity or narrowness of folding routes. For the ensemble
of sequences having a given $T_F/T_G$, homogeneously
ordering sequences have the largest folding
free energy barrier. For most structures, where topological
factors play an important role, this regime is
achieved by introducing a large dispersion in the distribution
of native contact energies which in practice
would be almost impossible to achieve. As we reduce
the dispersion in the contact energy distribution
to a uniform value $\bar{\epsilon}$, the dispersion of contact
participations increases and thus the number of folding
routes decreases, the free energy barrier decreases,
and the total configurational entropy at the TSE' initially
increases due to polymer halo effects. The folding
temperature is only mildly affected; the prefactor
appearing in the rate is probably only mildly affected
also, since it is largely a function of $T_F/T_G$ and
polymer properties [?]. Tuning the interaction energies
further results in more probable contacts having
stronger energy. Route diversity decreases to moderate
values - there are still many routes to the native
state, and $T_F/T_G$ is still sufficiently greater than one.
The barrier eventually decreases to zero, at relatively
mild dispersion in native contact energy. The funnel
picture, with different structural details, is valid for
the above wide range of native contact energy distributions.
However, tuning the energies further so
that probable contacts have even lower energy eventually
induces the system to take a single or very few
folding routes at the transition temperature. A large
dispersion of energies is required to achieve this, and
in this regime the folding temperature drops well below
the glass temperature range, where folding rates
are extremely slow.

Since fine tuning interactions on the funnel may affect
the rate, sequences may be designed to fold either
faster or slower to the same structure as a wild-type
sequence, depending on how the interaction strengths
correlate with the entropic likelihood of contact formation.
Folding rates in mutant proteins that exceed



14 We use a prime since we actually look at the barrier peak 
along the Q coordinate. 



those of the wild type have been receiving much interest
in recent experiments [?], [39]. Enhancement
(or suppression) of folding rate to a given structure
due to changes in sequence is modeled in our theory
through changes in native interactions; our results
are fully supported by the experiments cited above.
The fact that a minimally frustrated protein is robust
to perturbations in the interactions means that
at least the folding scenarios depicted in the center
2 rows of fig. 1 are feasible within the ensemble of
sequences that fold to the given structure. However
the number of sequences should be maximal when all
the native interactions are near their average, and
the actual width of the native interactions depends
on the true potential energy function. Fluctuations
in rate due to the weakening or strengthening of non-native
traps by sequence perturbations are an interesting
topic of future research. The enhancements or reductions
in rate we have explored are mild compared
to the enhancement by minimal frustration (funneling
the landscape): the fine tuning of rates may be
a phenomenon manifested by in vitro or in machina
evolution, rather than in vivo evolution. Nevertheless
rate tuning and folding heterogeneity may become an
important factor for larger proteins, where e.g. stabilizing
partially native intermediates may increase
the overall rate or prevent aggregation. Given that
a sequence is minimally frustrated, heterogeneity or
broken-ordering-symmetry in fact aids folding, similar
to the enhancement of nucleation rates seen in
other disordered media [?]. Similar effects have been
observed in Monte Carlo simulations of sequence evolution,
when the selection criterion involves fast folding
rate [33]. Here we see how such phenomena can
arise from general considerations of the energy landscape
theory. The notion that rates increase with
heterogeneity at little expense to native stability contrasts
with the view that non-uniform ordering exists
merely as a residual signature of incomplete evolution
to a uniformly folding state. Adjusting the backbone
rigidity or the non-additivity of interactions [10], [61]
can also modify the barrier height, possibly as much
as the effects we are considering here. There may
also be functional reasons for non-uniform folding
- malleability or rigidity requirements of the active
site may inhibit or enhance its tendency to order.
The amount of route narrowness in folding was introduced
as a thermodynamic measure through the
mean square fluctuations in a local order parameter.
The route measure may be useful in quantifying
the natural kinetic accessibility of various structures.
While structural heterogeneity is essentially
always present, the flexibility inherent in the number
of letters of the sequence code limits the amount
of native energetic heterogeneity possible. However,
some sequence flexibility is in fact required for funnel
topographies [32] and so is probably present at
least to a limited degree. We have seen here how a
very general theoretical framework can be introduced 
to explain and understand the effects of local hetero- 
geneity in native stability and structural topology on 
such quantities as folding rates, transition tempera- 
tures, and the degree of routing in the funnel folding 
mechanism. Such a theory should be a useful guide 
in interpreting and predicting experimental results on 
many fast-folding proteins.
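
As an illustration of the route measure discussed above, the following is a minimal sketch rather than the implementation used in this work. It assumes the measure is the mean square fluctuation of the contact probabilities Q_i about their mean Q, normalized by the largest fluctuation possible at that Q, namely Q(1 - Q). That normalization, the function name `route_measure`, and the sample contact probabilities are assumptions made for illustration only.

```python
import numpy as np

def route_measure(contact_probs):
    """Normalized mean-square fluctuation of contact probabilities Q_i.

    Assumed form: R = <(Q_i - Q)^2> / (Q (1 - Q)), where Q is the mean of Q_i.
    Returns nan at Q = 0 or Q = 1, where the ratio is 0/0.
    """
    q_i = np.asarray(contact_probs, dtype=float)
    q = q_i.mean()
    denom = q * (1.0 - q)
    if denom == 0.0:
        return float("nan")
    return float(((q_i - q) ** 2).mean() / denom)

# Hypothetical contact probabilities, all with overall Q = 0.5.
uniform = route_measure([0.5, 0.5, 0.5, 0.5])       # homogeneous ordering
single_route = route_measure([1.0, 1.0, 0.0, 0.0])  # contacts fully formed or absent
mixed = route_measure([0.9, 0.7, 0.3, 0.1])         # intermediate heterogeneity
```

Under this assumed normalization the measure runs from zero for uniform ordering to one when every contact is either fully formed or absent, which matches the two limiting scenarios described for the uniform funnel and the single-route case.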

We thank Peter Wolynes, Hugh Nymeyer, Ce- 
cilia Clementi, and Chinlin Guo for their gener- 
ous and insightful discussions. This work was ini- 
tiated while Plotkin was a graduate student with 
Peter Wolynes. This work was supported by NSF 
Grant MCB9603839 and NSF Bio-Informatics fellow- 
ship DBI9974199. 
CAPTION FOR FIG. 1:

The effects of heterogeneity in contact probability (increased from top to bottom) on barrier height $F^{\ddagger}$,
folding temperature $T_F$, and ordering heterogeneity are summarized here; plots are for simulations of a
27-mer lattice Go model (yellow) to the same native structure (given in ]2~H|), and for the analytic theory in
the text (red). The simulation results make no assumptions on the nature of the configurational entropy;
the theoretical results use the approximate state function of eq. (^), along with a cutoff used for the shorter
loops so the bond entropy loss for each loop is always < (the same loop length distribution as in the lattice
structure is used). In the top row, energies are tuned for both simulation and theory to fully symmetrize the
funnel: $Q_i(\epsilon^*) = Q$. Second row: energies are then relaxed for the simulation results so they are all equal,
$\epsilon_i = \epsilon$; energies in the theory are relaxed the same way until a comparable $T_F$ is achieved. Third row: energies
are then further tuned to a distribution $\{\epsilon_i\} = \{\epsilon^{\circ}\}$ that kills the barrier (there are many such distributions; all
that is necessary is sufficient contact heterogeneity). The top 3 rows are funneled folding mechanisms with
many routes to the native structure. Last row: energies are tuned to induce a single or a few specific routes
for folding. All the while the energies are constrained to sum to $E_N$: $\sum_i \epsilon_i = E_N$. The free energy profile
$F(Q)$ (in units of $\epsilon$) is plotted in the left column at the folding transition temperature $T_F$, which is given.
The next column shows the distribution of thermodynamic contact probabilities $Q_i(Q^{\ddagger}) = \phi'$ at the barrier
peak (we use the notation $\phi'$ since this is a thermodynamic rather than kinetic measurement; however, for
well-designed proteins the two are strongly correlated, with coefficient $\approx 0.85$ [42]). Only simulation results
are shown to keep the figure easy to read; the theory gives $\phi'$ distributions within ~10%, as may be inferred
from their similar route measures. The next column shows the route measure $\mathcal{R}(Q)$ of eq. (§) and gives
the dispersion in native energies required to induce the scenario of that row ($\mathcal{R}(0,1) = 0/0$ is undefined
and so is omitted from the simulation plots; it is defined in the theory through the limit $Q \to 0, 1$). The
right column shows schematically the different folding routes as heterogeneity is increased, from a maximum
number of routes through to essentially just one route.

TOP ROW: In the uniformly ordering funnel
we can see first that $P(\phi')$ is a delta function and $\mathcal{R}(Q^{\ddagger}) = 0$ (c.f. eq. (||)), so ordering at the transition
state (or barrier peak $Q^{\ddagger}$) is essentially homogeneous. The number of routes through the bottleneck (c.f.
eq. (Q)) is maximized, as schematically drawn on the right. Branches are drawn in the routes to illustrate
the minimum of $\mathcal{R}(Q)$ at $Q^{\ddagger}$. The free energy barrier is maximized (eq. (10)), thus the stability of the
native state at fixed temperature and native energy is maximized, and so the folding temperature $T_F$ at
fixed native energy is maximized. $T_F$ in the simulation is defined as the temperature where the native state
($Q = 1$) is occupied 50% of the time. In the theory, at $T_F$ the probability for $Q > 0.8$ is 0.5. A very large
dispersion in energies is required to induce this scenario; some contact energies are nearly zero, others are
several times stronger than the average.

SECOND ROW: In the uniform native energy funnel the barrier
height is roughly halved while hardly changing $T_F$, for the following reason. In a Go model, as the contact
energies are relaxed from $\{\epsilon^*\}$ to a uniform value $\epsilon_i = \epsilon$, the energy of the transition state is essentially
constant: initially the energy is $\sum_i Q_i(Q^{\ddagger}) \epsilon_i^* = Q \sum_i \epsilon_i^* = Q E_N$, and as the contact energies are relaxed to
a uniform value, $\sum_i Q_i \epsilon = \epsilon \sum_i Q_i = Q E_N$ once again. However, the transition state entropy increases and
obtains its maximal value when $\epsilon_i = \epsilon$, because then all microstates at $Q^{\ddagger}$ are equally probable, since the
probability to occupy a microstate is $p_i \sim \exp(-E_i(Q^{\ddagger})/T) = \exp(-Q E_N/T)/Z = 1/\Omega(Q^{\ddagger})$. The thermal
entropy $-\sum_i p_i \log p_i$ then equals the configurational entropy $\log \Omega(Q^{\ddagger})$ (its largest possible value). Thus
as contact energies are relaxed from $\epsilon^*$, where they are anti-correlated to their loop lengths (more negative
energies tend to be required for longer loops to have equal free energies), to $\epsilon$, where they are uncorrelated to
their loop lengths, the barrier initially decreases because the total entropy of the bottleneck increases (drawn
schematically on the right), i.e. increases in polymer halo entropy are more important than decreases in
route entropy. The system is still sufficiently two-state that $T_F$ is hardly changed. $P(\phi')$ is broad, indicating
inhomogeneity in the transition state, due solely in this scenario to the topology of the native structure since
all contacts are equivalent energetically. Routing is more pronounced: when $\epsilon_i = \epsilon$, $\mathcal{R}(Q)$ is a measure of
the intrinsic fluctuations in order due to the natural inhomogeneity present in the native structure; different
structures will have different profiles, and it will be interesting to see how this measure of structure couples
with thermodynamics and kinetics of folding. Loops and dead ends in the schematic drawings are used to
illustrate local decreases and increases in $\mathcal{R}(Q)$; these fluctuations are captured by the theory only when the
routing becomes pronounced (last row). The solid curves presented for the theory are shown for a reduction
in $T_F$ comparable to the simulations. There is still some energetic heterogeneity present as indicated. When
$\epsilon_i = \epsilon$ in the theory (dashed curves), the fluctuations in $Q_i$ are somewhat larger than the simulation values,
and the entropic heterogeneity is sufficient to kill the barrier: the free energy is downhill at $T_F = 0.5\epsilon$.
The free energy barrier results from a cancellation of large terms and is significantly more sensitive than
intensive parameters such as the route measure $\mathcal{R}(Q)$.

THIRD ROW: In approaching the zero-barrier funnel
scenario for the simulation, the energies are further perturbed and now begin to anti-correlate with contact
probability (and tend to correlate with loop length); i.e. more probable contacts (which tend to have shorter
loops) have stronger energies. For the theory not as much heterogeneity is required. Contact energies are
still correlated with formation probability as indicated by the signs in parentheses. The free energy barrier
continues to decrease until, at some set of energies $\{\epsilon^{\circ}\}$, the barrier at $T_F$ vanishes entirely. All the while
the transition temperature $T_F$ decreases by only ~10%, so that slowing of dynamics (as $T_F$ approaches $T_G$)
would not be a major factor. At this point the $\phi'$ distribution at the barrier position $Q^{\ddagger}(\epsilon)$ is essentially
bi-modal, but the distribution at $Q^{\ddagger}(\{\epsilon^{\circ}\})$ (inset) is less so because of transition state drift towards lower
$Q$ values (the Hammond effect). A relatively small amount of energetic heterogeneity is needed to kill the
barrier at $T_F$. There are still many routes to the native state since $\mathcal{R}(Q^{\ddagger}) \approx 0.3$ to $0.4$, but some contacts
are fully formed in the transition state (some $\phi' = 1$).

BOTTOM ROW: As the energies continue to be
perturbed to values that cause folding to occur by a single dominant route rather than a funnel mechanism,
folding becomes strongly downhill at the transition temperature, which drops more sharply towards $T_G$: here
to induce a single pathway $T_F$ must be decreased to about 1/4 the putative estimate of $T_G$ (about $T_F(\{\epsilon\})/1.6$,
see [§). In this scenario, the actual shape of the free energy profile depends strongly on which route the
system is tuned to; non-native interactions, not included here, become important. Contact participation at
the barrier is essentially one or zero, and the route measure at the barrier is essentially one. The entropy
at the bottleneck is relatively small (the halo entropy of a single native core). The energetic heterogeneity
necessary to achieve this scenario is again very large, comparable to what is needed to achieve a uniform
funnel.
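
The entropy argument given for the second row can be checked with a small numerical sketch. The example assumes a toy set of microstates sharing the same mean energy; the energies, the number of states, the temperature, and the function name `thermal_entropy` are hypothetical and are not taken from the simulations or the theory. It only illustrates that the thermal entropy $-\sum_i p_i \log p_i$ reaches $\log \Omega$ when all microstate energies are equal and is lower otherwise.

```python
import numpy as np

def thermal_entropy(energies, temperature):
    """Entropy -sum_i p_i log p_i of Boltzmann-weighted microstates (k_B = 1)."""
    e = np.asarray(energies, dtype=float)
    # Shift by the minimum energy for numerical stability; p_i is unchanged.
    weights = np.exp(-(e - e.min()) / temperature)
    p = weights / weights.sum()
    return float(-(p * np.log(p)).sum())

# Hypothetical microstates at the barrier: same mean energy, different spread.
n_states = 8
mean_energy = -3.0
temperature = 1.0
equal = np.full(n_states, mean_energy)
spread = mean_energy + np.array([-1.5, -1.0, -0.5, 0.0, 0.0, 0.5, 1.0, 1.5])

s_equal = thermal_entropy(equal, temperature)    # equals log(n_states)
s_spread = thermal_entropy(spread, temperature)  # strictly smaller
```

With equal energies every microstate has probability $1/\Omega$, so the computed entropy coincides with $\log \Omega$; any dispersion in the microstate energies at the same temperature lowers it.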



References 

[1] Onuchic, J. N, Luthey-Schulten, Z, & Wolynes,
P. G. (1997) Annu Rev Phys Chem 48, 545-600.

[2] Dill, K. A & Chan, H. S. (1997) Nature Struct
Biol 4, 10-19.

[3] Veitshans, T, Klimov, D, & Thirumalai, D.
(1997) Folding and Design 2, 1-22.

[4] Gruebele, M. (1999) Annu Rev Phys Chem 50,
485-516.

[5] Bryngelson, J. D & Wolynes, P. G. (1987) Proc
Nat Acad Sci USA 84, 7524-7528.

[6] Goldstein, R. A, Luthey-Schulten, Z. A, &
Wolynes, P. G. (1992) Proc Nat Acad Sci USA
89, 4918-4922.

[7] Shakhnovich, E. I & Gutin, A. M. (1993) Proc
Nat Acad Sci USA 90, 7195-7199.

[8] Leopold, P. E, Montal, M, & Onuchic, J. N.
(1992) Proc Nat Acad Sci USA 89, 8721-8725.

[9] Onuchic, J. N, Wolynes, P. G, Luthey-Schulten,
Z, & Socci, N. D. (1995) Proc Nat Acad Sci USA
92, 3626-3630.

[10] Plotkin, S. S, Wang, J, & Wolynes, P. G. (1997)
J Chem Phys 106, 2932-2948.

[11] Wolynes, P. G. (1996) Proc Nat Acad Sci USA
93, 14249-14255.

[12] Li, H, Helling, R, Tang, C, & Wingreen, N.
(1996) Science 273, 666-669.

[13] Nelson, E. D, Teneyck, L. F, & Onuchic, J. N.
(1997) Phys Rev Lett 79, 3534-3537.

[14] Wales, D. J & Scheraga, H. A. (1999) Science
285, 1368-1372.

[15] Ball, K. D, Berry, R. S, Kunz, R. E, Li, F. Y,
Proykova, A. A, & Wales, D. J. (1996) Science
271, 963-966.

[16] Pande, V. S, Grosberg, A. Y, & Tanaka, T.
(1995) J Chem Phys 103, 9482-9491.

[17] Bryngelson, J. D & Wolynes, P. G. (1989) J
Phys Chem 93, 6902-6915.

[18] Bryngelson, J. D, Onuchic, J. N, Socci, N. D, &
Wolynes, P. G. (1995) Proteins 21, 167-195.



[19] Sali, A, Shakhnovich, E, & Karplus, M. (1994) 
Nature 369, 248-251. 

[20] Boczko, E. M & Brooks, C. L. (1995) Science 
269, 393-396. 

[21] Socci, N. D, Onuchic, J. N, & Wolynes, P. G.
(1996) J Chem Phys 104, 5860-5868.

[22] Lazaridis, T & Karplus, M. (1997) Science 278, 
1928-1931. 

[23] Pande, V. S & Rokhsar, D. S. (1999) Proc Nat 
Acad Sci USA 96, 1273-1278. 

[24] Burton, R. E, Huang, G. S, Daugherty, M. A, 
Calderone, T, & Oas, T. G. (1997) Nature Struct 
Biol 4, 305-310. 

[25] Oliveberg, M, Tan, Y, Silow, M, & Fersht, A. 
(1998) J Mol Biol 277, 933-943. 

[26] Goldbeck, R. A, Thomas, Y. G, Chen, E,
Esquerra, R. M, & Kliger, D. S. (1999) Proc Nat
Acad Sci USA 96, 2782-2787.

[27] Fersht, A. R, Matouschek, A, & Serrano, L. 
(1992) J Mol Biol 224, 771-782. 

[28] Radford, S. A, Dobson, M, & Evans, P. A. (1992) 
Nature 358, 302-307. 

[29] Bai, Y, Sosnick, T. R, Mayne, L, & Englander, 
S. W. (1995) Science 269, 192-197. 

[30] Martinez, J. C, Pisabarro, M. T, & Serrano, L. 
(1998) Nature Struct Biol 5, 721-729. 

[31] Grantcharova, V. P, Santiago, J. V, Baker, D, 
& Riddle, D. S. (1998) Nature Struct Biol 5, 
714-720. 

[32] Abkevich, V. I, Gutin, A. M, & Shakhnovich, 
E. I. (1994) Biochemistry 33, 10026-10036. 

[33] Gutin, A. M, Abkevich, V. I, & Shakhnovich, 
E. I. (1995) Proc Nat Acad Sci USA 92, 1282- 
1286. 

[34] Panchenko, A. R, Luthey-Schulten, Z, & 
Wolynes, P. G. (1996) Proc Nat Acad Sci USA 
93, 2008-2013. 

[35] Onuchic, J. N, Socci, N. D, Luthey-Schulten, Z, 
& Wolynes, P. G. (1996) Folding and Design 1, 
441-450. 

[36] Shoemaker, B. A, Wang, J, & Wolynes, P. G.
(1997) Proc Nat Acad Sci USA 94, 777-782.



[37] Portman, J. J, Takada, S, & Wolynes, P. G.
(1998) Phys Rev Lett 81, 5237-5240.

[38] Klimov, D & Thirumalai, D. (1998) J Mol Biol 
282, 471-492. 

[39] Sheinerman, F. B & Brooks, C. L. (1998) Proc
Nat Acad Sci USA 95, 1562-1567. 

[40] Micheletti, C, Banavar, J. R, Maritan, A, & 
Seno, F. (1999) Phys Rev Lett 82, 3372-3375.

[41] Shea, J. E, Onuchic, J. N, & Brooks, C. L. (1999) 
Proc Nat Acad Sci USA 96, 12512-12517. 

[42] Nymeyer, H, Socci, N. D, & Onuchic, J. N.
(2000) Proc Nat Acad Sci USA 97, 634-639. 

[43] Socci, N. D & Onuchic, J. N. (1995) J Chem 
Phys 103, 4732-4744. 

[44] Abkevich, V. I, Gutin, A. M, & Shakhnovich, 
E. I. (1995) J Mol Biol 252, 460-471. 

[45] Betancourt, M. R & Onuchic, J. N. (1995) J 
Chem Phys 103, 773-787. 

[46] Viguera, A. R, Villegas, V, Aviles, F. X, &
Serrano, L. (1996) Folding and Design 2, 23-33.

[47] Plaxco, K. W, Simons, K. T, & Baker, D. (1998) 
J Mol Biol 277, 985-994. 

[48] Shoemaker, B. A, Wang, J, & Wolynes, P. G.
(1999) J Mol Biol 287, 675-694.

[49] Alm, E & Baker, D. (1999) Proc Nat Acad Sci
USA 96, 11305-11310. 

[50] Munoz, V & Eaton, W. A. (1999) Proc Nat Acad 
Sci USA 96, 11311-11316. 

[51] Anderson, P. W. (1992) Concepts in solids. 
(Addison-Wesley, Reading, Massachusetts).

[52] Percus, J. K. (1982) in The liquid state of matter: 
Fluids, simple and complex, eds. Montroll, E & 
Lebowitz, J. (North-Holland, Amsterdam). 

[53] Plotkin, S. S, Wang, J, & Wolynes, P. G. (1996) 
Phys Rev E 53, 6271-6296. 

[54] Flory, P. J. (1956) J Am Chem Soc 78, 5222- 
5235. 

[55] Dill, K. A, Fiebig, K. M, & Chan, H. S. (1993) 
Proc Nat Acad Sci USA 90, 1942-1946. 

[56] Munoz, V & Serrano, L. (1996) Folding and De- 
sign 1, R71-R77. 



[57] Hagen, S. J, Hofrichter, J. A, Szabo, A, & Eaton,
W. A. (1996) Proc Nat Acad Sci USA 93, 11615- 
11617. 

[58] Kim, D. E, Gu, H, & Baker, D. (1998) Proc Nat 
Acad Sci USA 95, 4982-4986. 

[59] Brown, B. M & Sauer, R. T. (1999) Proc Nat 
Acad Sci USA 96, 1983-1988. 

[60] Karpov, V. G & Oxtoby, D. W. (1996) Phys Rev 
B 54, 9734-9745. 

[61] Kolinski, A, Godzik, A, & Skolnick, J. (1993) J 
Chem Phys 98, 7420-7433. 

[62] Wolynes, P. G. (1997) Nature Struct Biol 4, 871-
874. 


